Torchair graph812 cov #2337
base: main
Conversation
Signed-off-by: p00465316 <[email protected]>
Signed-off-by: taoyuxiang <[email protected]>
Code Review
This pull request adds support for Qwen models with torchair, including new model files, tests, and configuration changes. My review focuses on several critical issues. I've identified a typo in a test patch target that would cause it to fail. More importantly, I found two critical bugs in the `rotary_embedding` implementation: one causing a tensor shape mismatch due to incorrect `unsqueeze` operations, and another related to an incorrect monkey-patch target that would prevent the new rotary embedding logic from being used. These issues need to be addressed to ensure the correctness of the new functionality.
```python
def stubbed_get_state(ep_size, with_prefill, is_deepseek_v3_r1):
    return _get_fused_moe_state(16, with_prefill, is_deepseek_v3_r1)

with patch('mockif._get_fused_moe_state', side_effect=stubbed_get_state):
```
The target string for `patch` seems incorrect. `'mockif._get_fused_moe_state'` will likely cause a `ModuleNotFoundError` because `mockif` is not a valid module. Based on the imports, you probably intended to patch `_get_fused_moe_state` from `vllm_ascend.ascend_forward_context`.
Suggested change:
```diff
-with patch('mockif._get_fused_moe_state', side_effect=stubbed_get_state):
+with patch('vllm_ascend.ascend_forward_context._get_fused_moe_state', side_effect=stubbed_get_state):
```
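For context on why the target string matters: `unittest.mock.patch` imports the module named in the target and rebinds the attribute there, so the dotted path has to point at a real module where the name is looked up. Below is a minimal sketch of the corrected test setup, assuming the test imports `_get_fused_moe_state` from `vllm_ascend.ascend_forward_context`; the module and function names are taken from the snippet above and are not re-verified against the repository.

```python
from unittest.mock import patch

from vllm_ascend.ascend_forward_context import _get_fused_moe_state


def stubbed_get_state(ep_size, with_prefill, is_deepseek_v3_r1):
    # Ignore the caller-supplied ep_size and pin it to 16 for the test,
    # as in the original stub.
    return _get_fused_moe_state(16, with_prefill, is_deepseek_v3_r1)


# Patch the attribute on the module it lives in; a non-existent module such
# as 'mockif' raises ModuleNotFoundError as soon as the context manager starts.
with patch('vllm_ascend.ascend_forward_context._get_fused_moe_state',
           side_effect=stubbed_get_state):
    ...  # exercise the code path that calls _get_fused_moe_state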
```python
cos = cos.unsqueeze(-2).unsqueeze(-2)
sin = sin.unsqueeze(-2).unsqueeze(-2)

query = query.unsqueeze(1)
key = key.unsqueeze(1)

q_embed, k_embed = torch_npu.npu_apply_rotary_pos_emb(query, key, cos, sin)
```
The `unsqueeze(1)` calls on `query` and `key` introduce an extra dimension, which leads to an incorrect output shape from `rope_forward`. The attention mechanism expects a tensor of shape `(num_tokens, num_heads * head_size)`, but this implementation will produce a tensor with an extra dimension of size 1, e.g., `(num_tokens, 1, num_heads * head_size)`. This will cause a shape mismatch error in the subsequent attention computation. The `unsqueeze` calls on `cos` and `sin` should also be adjusted to a single `unsqueeze` to ensure correct broadcasting.
Suggested change:
```diff
-cos = cos.unsqueeze(-2).unsqueeze(-2)
-sin = sin.unsqueeze(-2).unsqueeze(-2)
-query = query.unsqueeze(1)
-key = key.unsqueeze(1)
-q_embed, k_embed = torch_npu.npu_apply_rotary_pos_emb(query, key, cos, sin)
+cos = cos.unsqueeze(-2)
+sin = sin.unsqueeze(-2)
+q_embed, k_embed = torch_npu.npu_apply_rotary_pos_emb(query, key, cos, sin)
```
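To make the broadcasting concrete, here is a pure-PyTorch shape sketch of the suggested fix. `torch_npu.npu_apply_rotary_pos_emb` is stood in for by an explicit rotate-half formulation, and the dimensions (and the rotation convention) are illustrative assumptions rather than values taken from the PR.

```python
import torch

num_tokens, num_heads, head_size = 4, 8, 64

# cos/sin as produced by the rotary cache: one row per token.
cos = torch.randn(num_tokens, head_size)
sin = torch.randn(num_tokens, head_size)

# query/key in the per-head layout used inside rope_forward.
query = torch.randn(num_tokens, num_heads, head_size)
key = torch.randn(num_tokens, num_heads, head_size)

# A single unsqueeze(-2) yields (num_tokens, 1, head_size), which broadcasts
# over the head axis; no extra batch-like dimension is added to query/key.
cos = cos.unsqueeze(-2)
sin = sin.unsqueeze(-2)


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


q_embed = query * cos + rotate_half(query) * sin
k_embed = key * cos + rotate_half(key) * sin

# Flattening the heads restores the (num_tokens, num_heads * head_size)
# layout that the downstream attention op expects.
assert q_embed.reshape(num_tokens, -1).shape == (num_tokens, num_heads * head_size)
assert k_embed.shape == key.shape
```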
```
@@ -283,7 +378,8 @@ def deepseek_rope_init_func(
        device="npu")


RotaryEmbedding.forward_oot = rope_forward_oot
RotaryEmbedding.__init__ = qwen_rope_init_func
RotaryEmbedding.forward_oot = rope_forward
```
You are monkey-patching `RotaryEmbedding.forward_oot`, but `RotaryEmbedding` instances are called via their `forward` method. To correctly override the rotary embedding logic, you should patch `RotaryEmbedding.forward`. Using `forward_oot` will not have the desired effect as it's not the standard method called.
Suggested change:
```diff
-RotaryEmbedding.forward_oot = rope_forward
+RotaryEmbedding.forward = rope_forward
```
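As a sanity check on the mechanism itself: assigning the replacement function to the class attribute is enough for every instance to pick it up, because `nn.Module.__call__` dispatches through `self.forward`, which resolves on the class. A toy sketch with a made-up module follows; the real `RotaryEmbedding` class lives in vLLM and is not reproduced here.

```python
import torch
import torch.nn as nn


class ToyRotaryEmbedding(nn.Module):
    def forward(self, positions: torch.Tensor, query: torch.Tensor,
                key: torch.Tensor):
        # Original (placeholder) behaviour: pass-through.
        return query, key


def patched_forward(self, positions: torch.Tensor, query: torch.Tensor,
                    key: torch.Tensor):
    # Replacement logic; existing and future instances both use it because
    # the attribute is rebound on the class, not on a single instance.
    return query * 2, key * 2


ToyRotaryEmbedding.forward = patched_forward

rope = ToyRotaryEmbedding()
q, k = rope(torch.zeros(1), torch.ones(2, 4), torch.ones(2, 4))
assert torch.equal(q, torch.full((2, 4), 2.0))
```

Whether `forward` or the platform-specific `forward_oot` hook is the right attribute to override ultimately depends on how vLLM dispatches on this backend; the snippet only demonstrates the class-level patching mechanism.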
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Force-pushed from d2462c9 to 0739169
Codecov Report

❌ Patch coverage is 77.67%. Your patch check has failed because the patch coverage (77.67%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2337      +/-   ##
==========================================
- Coverage   76.09%   75.77%   -0.32%
==========================================
  Files         114      118       +4
  Lines       13103    13731     +628
==========================================
+ Hits         9971    10405     +434
- Misses       3132     3326     +194
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Force-pushed from 59a4e7c to f759f5e
2. add rope_forward ut
Signed-off-by: taoyuxiang <[email protected]>
Force-pushed from bd35c24 to 72cdec2
This pull request has conflicts, please resolve those before we can evaluate the pull request.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?